Attribute Information:
The inputs are as follows:
X1=the transaction date (for example, 2013.250=2013 March, 2013.500=2013 June, etc.)
X2=the house age (unit: year)
X3=the distance to the nearest MRT station (unit: meter)
X4=the number of convenience stores in the living circle on foot (integer)
X5=the geographic coordinate, latitude. (unit: degree)
X6=the geographic coordinate, longitude. (unit: degree)
The output is as follows:
Y = house price per unit area (10,000 New Taiwan Dollars/Ping, where Ping is a local unit, 1 Ping = 3.3 square meters)
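The fractional part of X1 encodes the month, as the examples show (0.250 → March, 0.500 → June). A minimal sketch of decoding it; the `decode_date` helper is ours, not part of the dataset:

```r
# Decode X1: the integer part is the year, and the fractional part
# times 12 gives the month number (e.g. 2013.250 -> March 2013).
decode_date <- function(x1) {
  year  <- floor(x1)
  month <- round((x1 - year) * 12)
  c(year = year, month = month)
}
decode_date(2013.250)  # year 2013, month 3
```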
Some libraries to be used:
library(caret)
library(tidyverse)
setwd('C:/Users/Oliver/Documents/9/TAE/reto 3 superficie de respuesta knn')
datos_completos <- readxl::read_xlsx('Real estate valuation data set (1).xlsx',
                                     col_names = c('n', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'y'),
                                     skip = 1)
datos <- datos_completos %>%
  select('x1', 'x2', 'x3', 'x4', 'y')
# a quick look at the data:
dim(datos)
## [1] 414 5
names(datos)
## [1] "x1" "x2" "x3" "x4" "y"
head(datos)
tail(datos)
summary(datos)
## x1 x2 x3 x4
## Min. :2013 Min. : 0.000 Min. : 23.38 Min. : 0.000
## 1st Qu.:2013 1st Qu.: 9.025 1st Qu.: 289.32 1st Qu.: 1.000
## Median :2013 Median :16.100 Median : 492.23 Median : 4.000
## Mean :2013 Mean :17.713 Mean :1083.89 Mean : 4.094
## 3rd Qu.:2013 3rd Qu.:28.150 3rd Qu.:1454.28 3rd Qu.: 6.000
## Max. :2014 Max. :43.800 Max. :6488.02 Max. :10.000
## y
## Min. : 7.60
## 1st Qu.: 27.70
## Median : 38.45
## Mean : 37.98
## 3rd Qu.: 46.60
## Max. :117.50
str(datos_completos)
## tibble [414 x 8] (S3: tbl_df/tbl/data.frame)
## $ n : num [1:414] 1 2 3 4 5 6 7 8 9 10 ...
## $ x1: num [1:414] 2013 2013 2014 2014 2013 ...
## $ x2: num [1:414] 32 19.5 13.3 13.3 5 7.1 34.5 20.3 31.7 17.9 ...
## $ x3: num [1:414] 84.9 306.6 562 562 390.6 ...
## $ x4: num [1:414] 10 9 5 5 5 3 7 6 1 3 ...
## $ x5: num [1:414] 25 25 25 25 25 ...
## $ x6: num [1:414] 122 122 122 122 122 ...
## $ y : num [1:414] 37.9 42.2 47.3 54.8 43.1 32.1 40.3 46.7 18.8 22.1 ...
table(datos_completos$x4)
##
## 0 1 2 3 4 5 6 7 8 9 10
## 67 46 24 46 31 67 37 31 30 25 10
# Standardize the data, to avoid problems from the different variable scales when modeling
# and for more precision in the variability of the validation error. Save the mean and sd:
datoc <- scale(datos[, c("x1", "x2", "x3", "x4", "y")], center = TRUE, scale = TRUE)
centro <- attr(datoc, "scaled:center")  # scale() stores these as "scaled:center"/"scaled:scale"
escala <- attr(datoc, "scaled:scale")
datos <- as.data.frame(datoc)
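Because the model is now trained on standardized data, its predictions come out in standardized units. A minimal self-contained sketch of undoing `scale()` with the stored attributes; the same idea applies to `centro` and `escala` above:

```r
# scale() stores the column means and sds as attributes;
# original = standardized * sd + mean recovers the raw values.
x  <- c(10, 20, 30, 40)
xs <- scale(x, center = TRUE, scale = TRUE)
centro_x <- attr(xs, "scaled:center")
escala_x <- attr(xs, "scaled:scale")
x_back <- as.vector(xs) * escala_x + centro_x
```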
For this, all four variables are used, with repeated cross-validation (3 repeats of 10 folds); first the optimal k is obtained, then the summary of selection criteria:
# Repeated CV will be used: 3 repeats, k-folds = 10:
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
# selection of the optimal k with all 4 variables, using 3x10 CV, over k neighbors from 1 to 30:
knn_fit <- train(y ~ ., data = datos, method = "knn",
                 trControl = trctrl,
                 preProcess = c("knnImpute"),
                 tuneGrid = expand.grid(k = 1:30))
# Its results under the different selection criteria and the best-performing number of neighbors k:
knn_fit$bestTune
knn_fit
## k-Nearest Neighbors
##
## 414 samples
## 4 predictor
##
## Pre-processing: nearest neighbor imputation (4), centered (4), scaled (4)
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 372, 373, 374, 373, 371, 371, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 1 0.8006427 0.4882675 0.5071323
## 2 0.7173713 0.5387820 0.4757051
## 3 0.6691660 0.5829475 0.4566476
## 4 0.6388517 0.6074202 0.4421585
## 5 0.6335631 0.6093243 0.4461503
## 6 0.6337376 0.6066427 0.4459074
## 7 0.6277840 0.6133057 0.4434905
## 8 0.6254149 0.6158372 0.4423897
## 9 0.6252515 0.6151415 0.4445268
## 10 0.6277598 0.6114284 0.4453491
## 11 0.6280838 0.6107833 0.4456565
## 12 0.6272345 0.6122427 0.4444843
## 13 0.6290459 0.6101078 0.4457232
## 14 0.6299378 0.6089757 0.4458351
## 15 0.6300769 0.6088571 0.4465984
## 16 0.6286310 0.6108917 0.4465676
## 17 0.6292418 0.6101910 0.4476461
## 18 0.6299073 0.6101140 0.4489681
## 19 0.6283603 0.6132352 0.4475913
## 20 0.6276996 0.6145322 0.4477876
## 21 0.6278042 0.6149987 0.4471886
## 22 0.6287392 0.6138429 0.4474194
## 23 0.6286917 0.6143171 0.4476490
## 24 0.6289599 0.6145361 0.4481139
## 25 0.6303316 0.6132867 0.4494743
## 26 0.6319519 0.6115863 0.4504319
## 27 0.6342660 0.6093138 0.4519364
## 28 0.6354956 0.6078981 0.4529432
## 29 0.6359885 0.6080544 0.4527112
## 30 0.6382790 0.6056096 0.4553879
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 9.
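What caret is doing internally can be sketched in base R: for each candidate k, a KNN prediction is the mean of the k nearest training responses, and its quality is the RMSE averaged over folds. A minimal one-predictor sketch on synthetic data; `knn_reg` and `cv_rmse` are our own illustrative helpers, not caret's API:

```r
set.seed(1)
n <- 200
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)

# KNN regression: predict each point as the mean y of its k nearest neighbors.
knn_reg <- function(x_tr, y_tr, x_te, k) {
  sapply(x_te, function(p) mean(y_tr[order(abs(x_tr - p))[seq_len(k)]]))
}

# 10-fold CV RMSE for a given k.
cv_rmse <- function(k, folds = 10) {
  fold_id <- sample(rep(seq_len(folds), length.out = n))
  mean(sapply(seq_len(folds), function(f) {
    te <- fold_id == f
    sqrt(mean((y[te] - knn_reg(x[!te], y[!te], x[te], k))^2))
  }))
}

rmse_by_k <- sapply(1:30, cv_rmse)
which.min(rmse_by_k)  # the "bestTune" analogue
```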
For this case, the correlation matrix is simply used to check for redundant features:
# Remove redundant features
correlationMatrix <- cor(datos[,1:4])
# find attributes that are highly correlated (cutoff = 0.6 here; 0.75 is a common default)
highlyCorrelated <- findCorrelation(correlationMatrix, cutoff=0.6)
# print indexes of highly correlated attributes
# print indexes of highly correlated attributes
print(highlyCorrelated)
## [1] 3
Feature x3 could be considered for removal, since it shows a correlation of -0.60 with x4, but this does not exceed the 0.75 limit usually required to discard a variable.
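What `findCorrelation` flags can be reproduced by hand: compute the correlation matrix and look for pairs whose absolute correlation exceeds the cutoff (caret additionally uses the mean absolute correlation to choose which member of a pair to drop). A synthetic sketch:

```r
set.seed(2)
a <- rnorm(100)
b <- -0.8 * a + rnorm(100, sd = 0.3)  # built to correlate strongly with a
u <- rnorm(100)                        # independent noise
m <- cor(cbind(a = a, b = b, u = u))
cutoff <- 0.6
# Indices of pairs exceeding the cutoff (upper triangle only, to avoid duplicates).
high_pairs <- which(abs(m) > cutoff & upper.tri(m), arr.ind = TRUE)
```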
Six models were fit with the pairwise combinations of variables. For each one, the error measures for model selection, the best k, and the plot of k vs RMSE are obtained:
This model contains \(x1 \quad x2\), the transaction date and the age of the house:
knn_fit1 <- train(y ~ ., data = datos[, c(1, 2, 5)], method = "knn",
                  trControl = trctrl,
                  preProcess = c("knnImpute"),
                  tuneGrid = expand.grid(k = 1:30))
knn_fit1
## k-Nearest Neighbors
##
## 414 samples
## 2 predictor
##
## Pre-processing: nearest neighbor imputation (2), centered (2), scaled (2)
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 374, 372, 372, 373, 372, 373, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 1 1.2014363 0.06386558 0.9281103
## 2 1.0998786 0.07568688 0.8777499
## 3 1.0332487 0.09571103 0.8292164
## 4 1.0012218 0.10304622 0.8092729
## 5 0.9721354 0.12248939 0.7927042
## 6 0.9521052 0.13501577 0.7812509
## 7 0.9269530 0.16427617 0.7578738
## 8 0.9212295 0.16724910 0.7523449
## 9 0.9203969 0.16553200 0.7519513
## 10 0.9174246 0.16803883 0.7485100
## 11 0.9191615 0.16289694 0.7499328
## 12 0.9217200 0.15759610 0.7496515
## 13 0.9191882 0.16214017 0.7435085
## 14 0.9206462 0.15785317 0.7430251
## 15 0.9184748 0.16186967 0.7407515
## 16 0.9210511 0.15734498 0.7431701
## 17 0.9209989 0.15724939 0.7442800
## 18 0.9190017 0.16137771 0.7440804
## 19 0.9195514 0.16004081 0.7446788
## 20 0.9187419 0.16102346 0.7437728
## 21 0.9163529 0.16446204 0.7406460
## 22 0.9147314 0.16651466 0.7374160
## 23 0.9149852 0.16645358 0.7350820
## 24 0.9151201 0.16562691 0.7349973
## 25 0.9152559 0.16429431 0.7346025
## 26 0.9149201 0.16529079 0.7335555
## 27 0.9159015 0.16335138 0.7349590
## 28 0.9149876 0.16443964 0.7339200
## 29 0.9144716 0.16518572 0.7340239
## 30 0.9133628 0.16743740 0.7345480
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 30.
knn_fit1$bestTune
plot(knn_fit1)
This model contains \(x1 \quad x3\), the transaction date and the distance to the nearest MRT station:
knn_fit2 <- train(y ~ ., data = datos[, c(1, 3, 5)], method = "knn",
                  trControl = trctrl,
                  preProcess = c("knnImpute"),
                  tuneGrid = expand.grid(k = 1:30))
knn_fit2
## k-Nearest Neighbors
##
## 414 samples
## 2 predictor
##
## Pre-processing: nearest neighbor imputation (2), centered (2), scaled (2)
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 373, 372, 372, 373, 373, 372, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 1 0.8146788 0.4772282 0.5687585
## 2 0.7282263 0.5097028 0.5380944
## 3 0.7073080 0.5263099 0.5086082
## 4 0.6941858 0.5411176 0.4977764
## 5 0.6740585 0.5573220 0.4850428
## 6 0.6634569 0.5660128 0.4809047
## 7 0.6551975 0.5749496 0.4747094
## 8 0.6502581 0.5804701 0.4693092
## 9 0.6466050 0.5854894 0.4656072
## 10 0.6458026 0.5866665 0.4681836
## 11 0.6500215 0.5813941 0.4708590
## 12 0.6522548 0.5783847 0.4724740
## 13 0.6522494 0.5787401 0.4720805
## 14 0.6552570 0.5747583 0.4728879
## 15 0.6566907 0.5727105 0.4740091
## 16 0.6573500 0.5714565 0.4740474
## 17 0.6584269 0.5698987 0.4747750
## 18 0.6568674 0.5722389 0.4734429
## 19 0.6583392 0.5701333 0.4739853
## 20 0.6580490 0.5707972 0.4736735
## 21 0.6598254 0.5684459 0.4757288
## 22 0.6603250 0.5681418 0.4762749
## 23 0.6606551 0.5685454 0.4762349
## 24 0.6608604 0.5693304 0.4761891
## 25 0.6595726 0.5715128 0.4757649
## 26 0.6592928 0.5724495 0.4763635
## 27 0.6602296 0.5719067 0.4770055
## 28 0.6603806 0.5723777 0.4782354
## 29 0.6607952 0.5727561 0.4787423
## 30 0.6627074 0.5703096 0.4807798
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 10.
knn_fit2$bestTune
plot(knn_fit2)
This model contains \(x1 \quad x4\), the transaction date and the number of convenience stores:
knn_fit3 <- train(y ~ ., data = datos[, c(1, 4, 5)], method = "knn",
                  trControl = trctrl,
                  preProcess = c("knnImpute"),
                  tuneGrid = expand.grid(k = 1:30))
knn_fit3
## k-Nearest Neighbors
##
## 414 samples
## 2 predictor
##
## Pre-processing: nearest neighbor imputation (2), centered (2), scaled (2)
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 373, 371, 373, 373, 371, 373, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 1 0.9603181 0.2149786 0.7131368
## 2 0.9284028 0.2299306 0.6941330
## 3 0.8617405 0.2909020 0.6537014
## 4 0.8406814 0.3169460 0.6330827
## 5 0.8260652 0.3352410 0.6183191
## 6 0.8134487 0.3523661 0.6093366
## 7 0.8120919 0.3530736 0.6105054
## 8 0.8108423 0.3537278 0.6095059
## 9 0.8093801 0.3542346 0.6103979
## 10 0.8075009 0.3562714 0.6074556
## 11 0.8040508 0.3611332 0.6044225
## 12 0.8015538 0.3652740 0.6010854
## 13 0.8014128 0.3662174 0.6006616
## 14 0.8019617 0.3654012 0.6018455
## 15 0.8024146 0.3632519 0.6033059
## 16 0.8021896 0.3638336 0.6029058
## 17 0.8019623 0.3638650 0.6022095
## 18 0.8011548 0.3650431 0.6008924
## 19 0.8001597 0.3659560 0.5999592
## 20 0.7998914 0.3661064 0.6001544
## 21 0.7981622 0.3692635 0.6005220
## 22 0.7966635 0.3717861 0.5999952
## 23 0.7951226 0.3741693 0.5998670
## 24 0.7935436 0.3766966 0.5991525
## 25 0.7936396 0.3764988 0.5992040
## 26 0.7934983 0.3774351 0.5992142
## 27 0.7938036 0.3771365 0.5984205
## 28 0.7938205 0.3775952 0.5978549
## 29 0.7953468 0.3757446 0.5977999
## 30 0.7950621 0.3762792 0.5973261
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 26.
knn_fit3$bestTune
plot(knn_fit3)
This model contains \(x2 \quad x3\), the house age and the distance to the nearest MRT station:
knn_fit4 <- train(y ~ ., data = datos[, c(2, 3, 5)], method = "knn",
                  trControl = trctrl,
                  preProcess = c("knnImpute"),
                  tuneGrid = expand.grid(k = 1:30))
knn_fit4
## k-Nearest Neighbors
##
## 414 samples
## 2 predictor
##
## Pre-processing: nearest neighbor imputation (2), centered (2), scaled (2)
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 373, 373, 373, 372, 372, 372, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 1 0.6779796 0.6046783 0.4364273
## 2 0.6062891 0.6547939 0.4004069
## 3 0.5964975 0.6637222 0.3935846
## 4 0.5855952 0.6710949 0.3962406
## 5 0.5802261 0.6728412 0.3981383
## 6 0.5788448 0.6726032 0.3998233
## 7 0.5801730 0.6697664 0.4001744
## 8 0.5873903 0.6601935 0.4060002
## 9 0.5966296 0.6494412 0.4115472
## 10 0.6003080 0.6436738 0.4143625
## 11 0.6027932 0.6398162 0.4174723
## 12 0.6048500 0.6365005 0.4185478
## 13 0.6059422 0.6344401 0.4186361
## 14 0.6073554 0.6324337 0.4182949
## 15 0.6126021 0.6258508 0.4214043
## 16 0.6140236 0.6233158 0.4243967
## 17 0.6154903 0.6214856 0.4248947
## 18 0.6165370 0.6198429 0.4250147
## 19 0.6198005 0.6157395 0.4278172
## 20 0.6225504 0.6123909 0.4298867
## 21 0.6250641 0.6093549 0.4312658
## 22 0.6271722 0.6068576 0.4323722
## 23 0.6285783 0.6050569 0.4328658
## 24 0.6298062 0.6034751 0.4329338
## 25 0.6310454 0.6022888 0.4342398
## 26 0.6333726 0.5993215 0.4361513
## 27 0.6346202 0.5976564 0.4375531
## 28 0.6353271 0.5969520 0.4388689
## 29 0.6367628 0.5952756 0.4412017
## 30 0.6380712 0.5937609 0.4436360
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 6.
knn_fit4$bestTune
plot(knn_fit4)
This model contains \(x2 \quad x4\), the house age and the number of convenience stores:
knn_fit5 <- train(y ~ ., data = datos[, c(2, 4, 5)], method = "knn",
                  trControl = trctrl,
                  preProcess = c("knnImpute"),
                  tuneGrid = expand.grid(k = 1:30))
knn_fit5
## k-Nearest Neighbors
##
## 414 samples
## 2 predictor
##
## Pre-processing: nearest neighbor imputation (2), centered (2), scaled (2)
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 371, 373, 372, 374, 372, 372, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 1 0.7941620 0.4860517 0.5335523
## 2 0.7283709 0.5116053 0.5038691
## 3 0.7209468 0.5153381 0.4915121
## 4 0.7233108 0.5098283 0.4968245
## 5 0.7190457 0.5156791 0.4952390
## 6 0.7115733 0.5195778 0.4955930
## 7 0.7118602 0.5150910 0.4983763
## 8 0.7115441 0.5139492 0.5035127
## 9 0.7098944 0.5143357 0.5063993
## 10 0.7094687 0.5135527 0.5100065
## 11 0.7064946 0.5153350 0.5104105
## 12 0.7083284 0.5113761 0.5130074
## 13 0.7047724 0.5151308 0.5110381
## 14 0.7005023 0.5196524 0.5118062
## 15 0.6992040 0.5204712 0.5117025
## 16 0.6981346 0.5213023 0.5123603
## 17 0.6972081 0.5224729 0.5133877
## 18 0.6961799 0.5232387 0.5126466
## 19 0.6961694 0.5233630 0.5113874
## 20 0.6961142 0.5234840 0.5111524
## 21 0.6967618 0.5220646 0.5128291
## 22 0.6969474 0.5219449 0.5146789
## 23 0.6966658 0.5224028 0.5171892
## 24 0.6958560 0.5232891 0.5175484
## 25 0.6960273 0.5234671 0.5186474
## 26 0.6950116 0.5248487 0.5185443
## 27 0.6957288 0.5241595 0.5192062
## 28 0.6961582 0.5233361 0.5195045
## 29 0.6974086 0.5212987 0.5208973
## 30 0.6980221 0.5204005 0.5216840
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 26.
knn_fit5$bestTune[[1]]
## [1] 26
plot(knn_fit5)
This model contains \(x3 \quad x4\), the distance to the nearest MRT station and the number of convenience stores:
knn_fit6 <- train(y ~ ., data = datos[, c(3, 4, 5)], method = "knn",
                  trControl = trctrl,
                  preProcess = c("knnImpute"),
                  tuneGrid = expand.grid(k = 1:30))
knn_fit6
## k-Nearest Neighbors
##
## 414 samples
## 2 predictor
##
## Pre-processing: nearest neighbor imputation (2), centered (2), scaled (2)
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 372, 371, 371, 373, 374, 374, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 1 0.6464270 0.6264692 0.4235647
## 2 0.6307216 0.6294148 0.4288128
## 3 0.6257927 0.6357139 0.4303352
## 4 0.6115260 0.6410582 0.4264299
## 5 0.6090733 0.6393778 0.4301734
## 6 0.6144900 0.6338717 0.4346380
## 7 0.6304694 0.6141804 0.4405055
## 8 0.6287945 0.6142301 0.4401579
## 9 0.6304924 0.6127300 0.4436554
## 10 0.6341234 0.6081175 0.4479324
## 11 0.6359197 0.6058804 0.4497993
## 12 0.6432751 0.5983619 0.4557168
## 13 0.6461530 0.5953084 0.4602024
## 14 0.6514271 0.5896653 0.4644482
## 15 0.6536482 0.5874107 0.4682819
## 16 0.6572694 0.5840934 0.4710225
## 17 0.6574170 0.5824101 0.4713523
## 18 0.6581613 0.5806927 0.4699214
## 19 0.6582972 0.5790017 0.4695254
## 20 0.6576748 0.5784395 0.4707329
## 21 0.6568011 0.5794962 0.4707687
## 22 0.6551140 0.5811893 0.4697476
## 23 0.6563809 0.5797105 0.4708306
## 24 0.6565428 0.5790174 0.4707390
## 25 0.6568729 0.5782917 0.4704014
## 26 0.6571568 0.5766005 0.4701272
## 27 0.6536598 0.5812845 0.4667442
## 28 0.6525734 0.5825621 0.4660138
## 29 0.6545386 0.5795937 0.4675761
## 30 0.6553753 0.5780943 0.4690605
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.
knn_fit6$bestTune
plot(knn_fit6)
With this, the code above can be summarized: for each model, first the pair of variables is shown, then the error criteria used for model selection:
for (i in list(knn_fit1, knn_fit2, knn_fit3, knn_fit4, knn_fit5, knn_fit6)) {
  print(i$coefnames)
  print(i$results %>% filter(k == i$bestTune[[1]]))
}
## [1] "x1" "x2"
## k RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 30 0.9133628 0.1674374 0.734548 0.1233471 0.08345744 0.07129405
## [1] "x1" "x3"
## k RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 10 0.6458026 0.5866665 0.4681836 0.1486156 0.09797457 0.0746464
## [1] "x1" "x4"
## k RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 26 0.7934983 0.3774351 0.5992142 0.1683307 0.120036 0.07136043
## [1] "x2" "x3"
## k RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 6 0.5788448 0.6726032 0.3998233 0.1490725 0.1154914 0.05382897
## [1] "x2" "x4"
## k RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 26 0.6950116 0.5248487 0.5185443 0.1893983 0.1432126 0.07771896
## [1] "x3" "x4"
## k RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 5 0.6090733 0.6393778 0.4301734 0.166965 0.131845 0.06491474
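The best rows printed above can be collected into a single table (values transcribed from the loop output) and sorted by RMSE, which makes the winner obvious:

```r
# Best-k results per variable pair, transcribed from the loop output above.
resumen <- data.frame(
  vars = c("x1+x2", "x1+x3", "x1+x4", "x2+x3", "x2+x4", "x3+x4"),
  k    = c(30, 10, 26, 6, 26, 5),
  RMSE = c(0.9134, 0.6458, 0.7935, 0.5788, 0.6950, 0.6091)
)
resumen[order(resumen$RMSE), ]  # x2+x3 comes out on top
```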
According to these results, focusing first on RMSE, the pair of variables that gives the lowest error is x2 and x3 (knn_fit4, k = 6).
The response surface is plotted:
medias <- knn_fit4$preProcess$mean
dsv_es <- knn_fit4$preProcess$std
x2 <- seq(min(datos$x2), max(datos$x2), length.out = 100)
x3 <- seq(min(datos$x3), max(datos$x3), length.out = 100)
test.df <- expand.grid(x2, x3)
names(test.df) <- c("x2", "x3")
test_pred <- predict(knn_fit4, newdata = test.df)
test.df$y <- test_pred
z <- matrix(test_pred, ncol = length(x3), nrow = length(x2))
# Note: all axes are in standardized units, since the data were scaled earlier.
persp(x2, x3, z, xlab = "House age", ylab = "Distance to MRT", zlab = "Price",
      main = "Response surface for a two-variable model", theta = 135, shade = 0.3)
With plotly the surface is easier to explore:
library(plotly)
## Warning: package 'plotly' was built under R version 4.0.4
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
p <- plot_ly(x = x2, y = x3, z = z)
p <- add_surface(p)
# For 3D plots the axis titles must go inside `scene`, not in top-level xaxis/yaxis:
p <- layout(p, title = 'Price vs distance to MRT and house age',
            scene = list(xaxis = list(title = "House age"),
                         yaxis = list(title = "Distance to MRT (m)"),
                         zaxis = list(title = "Price")))
p
Here we can see where the response surface is actually supported by data:
ggplot(datos_completos, aes(x = x2, y = x3)) +
  geom_point() +
  labs(title = "House age vs distance", x = "House age", y = "Distance to MRT") +
  theme_light()
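One simple way to blank out the unsupported part of such a surface is to mark grid points that have no observation nearby. A synthetic sketch, where the 0.1 support radius is an arbitrary choice:

```r
set.seed(3)
obs  <- cbind(runif(50), runif(50))              # stand-in for observed (x2, x3) pairs
grid <- expand.grid(g1 = seq(0, 1, length.out = 20),
                    g2 = seq(0, 1, length.out = 20))
# Distance from each grid point to its nearest observation.
nearest <- apply(grid, 1, function(p)
  min(sqrt((obs[, 1] - p[1])^2 + (obs[, 2] - p[2])^2)))
supported <- nearest < 0.1   # e.g. set z[!supported] <- NA before plotting
```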